International Workshop on Spoken Language Translation, scheduled for
30th September – 1st October 2004, in Kyoto, Japan.
Spoken language translation technologies attempt to cross the language
barriers between people with different native languages who each want
to engage in conversation using their mother tongue. The importance
of these technologies is increasing because there are many more
opportunities for cross-language communication in face-to-face and
telephone conversation, especially in the domain of tourism.
Novel technologies have been proposed to tackle the problems in spoken
language translation research. A number of institutes are developing
huge bilingual or multilingual spoken language corpora. MT
technologies based on machine learning, such as statistical MT and
example-based MT, are being applied to the translation of spoken
language by using these corpora. Some of the characteristics of spoken
language seem suitable for the application of machine-learning-based
MT in comparison with written language. However, there is still no
concrete standard methodology for comparing the translation quality
of spoken language translation systems.
One of the prominent research activities in spoken language
translation is the work being conducted by the Consortium for Speech
Translation Advanced Research (C-STAR III), which is an international
partnership of research laboratories engaged in automatic translation
of spoken language. Current members include ATR (Japan), CAS (China),
CLIPS (France), CMU (USA), ETRI (Korea), ITC-irst (Italy), and UKA
(Germany). One of C-STAR's
ongoing projects is the joint development of a speech corpus that
handles a common task in multiple languages. The creation of such a
corpus will not only enable translation among multiple languages but
will also facilitate exchange and discussion of research results among
member labs. As a first result of this activity, a Japanese-English
speech corpus comprising tourism-related sentences, originally
compiled by ATR, has been translated into the native languages of the
C-STAR members.
In this workshop, an "evaluation campaign" of spoken language
translation technologies will be held by using the multilingual speech
corpus containing the tourism-related sentences developed by ATR and
C-STAR members. Two types of submissions are invited: 1) entries to
the evaluation campaign of spoken language translation
technologies; and 2) technical papers on related issues. An overview
of the evaluation campaign is as follows:
Main Theme: Evaluation of spoken language translation systems
Corpus used for the evaluation campaign:
- Basic Travel Expression Corpus (BTEC)
- Languages: Chinese-English, Japanese-English
- Domain: tourism-related sentences
- Media: text in utterance style
- Number of Sentence Pairs: 20,000 for each translation direction
Tracks of the Evaluation Campaign:
- Translation Directions:
  - Chinese to English
  - Japanese to English
- Resources Used:
  - Supplied corpus only (C-to-E, J-to-E)
  - Supplied corpus + additional linguistic resources available from LDC (C-to-E)
  - Unrestricted (C-to-E, J-to-E)
Evaluation Methodology of Translated Results
The workshop also invites technical papers related to spoken language
translation. Possible topics for the session include, but are not
limited to:
For more details of the event, visit the official website:
http://www.slt.atr.jp/IWSLT2004/